Abstract: The large spectral bandwidth and wide field of view of the Australian SKA Pathfinder radio telescope
will open up a completely new parameter space for large extragalactic HI surveys. Here we focus on identifying and 
parametrising HI absorption lines which occur in the line of sight towards strong radio continuum sources. We have 
developed a method for simultaneously finding and fitting HI absorption lines in radio data by using multi-nested 
sampling, a Bayesian Monte Carlo algorithm. The method is tested on a simulated ASKAP data cube, and is shown 
to be reliable at detecting absorption lines in low signal-to-noise data without the need to smooth or alter the data. 
Estimation of the local Bayesian evidence statistic provides a quantitative criterion for assigning significance to a 
detection and selecting between competing analytical line-profile models. 
1 Introduction

The Australian Square Kilometre Array Pathfinder's (ASKAP; Deboer et al. 2009) large spectral
bandwidth and wide field of view will dramatically improve our ability to conduct large-area
galaxy surveys in the 21 cm line of neutral hydrogen (Johnston et al. 2007).

The ASKAP HI All-Sky Survey (WALLABY Science Survey Proposal; Koribalski & Staveley-Smith 2009)^1
will cover 75 % of the sky (-90 deg < Dec. < +30 deg) at a spatial resolution of approximately
30 arcsec and velocity resolution of approximately 4 km s$^{-1}$. With an integration time of 8 h
per pointing (assuming a system temperature of 50 K) the survey will allow us to examine the HI
properties and large-scale distribution of ~500,000 galaxies out to a redshift of 0.26 (equivalent
to a look-back time of approximately 3 Gyr).

ASKAP will also be a powerful instrument for carrying 
out blind HI absorption-line surveys using background radio 
continuum sources. The advantage of absorption-line sur- 
veys is that their sensitivity depends only on the brightness 
of the background source, making it possible to probe the 
neutral gas content of individual galaxies at redshifts where 
the HI emission line is too weak to be detectable. 

The ASKAP First Large Absorption Survey in HI (FLASH Science Survey Proposal; Sadler et al. 2009)^2
will search for HI and OH absorption features in two redshift ranges (0 < z < 0.26 and
0.5 < z < 1.0) using bright background continuum sources from the existing SUMSS (Mauch et al.
2003) and NVSS (Condon et al. 1998) catalogues, both of which have an angular resolution of
45 arcsec. This amounts to a targeted search of over 150,000 sightlines to background continuum
sources, an increase of more than two orders of magnitude over the total number of sightlines
probed in all previous HI absorption-line surveys with radio telescopes. In the lower
(0 < z < 0.26) redshift range, the same ASKAP data are used for the FLASH and WALLABY surveys,
making it possible to cross-compare emission- and absorption-line measurements of local galaxies.

^1 http://www.atnf.csiro.au/research/WALLABY
^2 http://www.physics.usyd.edu.au/sifa/Main/FLASH

FLASH will search all ASKAP HI data cubes for absorption lines at the positions of radio
continuum sources with flux densities above 50 mJy in the 1.4 GHz NVSS and 843 MHz SUMSS surveys.
Since the positions of these background continuum sources are already known, the 'source-finding
problem' for FLASH is reduced to the need for a reliable 'line-finding' algorithm which can be
efficiently applied at a large number of pre-determined positions on the sky. When characterising
the detected lines, we want to obtain a reliable analytical model of the line profile and
distinguish between competing models, even in the low signal-to-noise ratio (SNR) regime.

A robust quantitative method of selecting between competing models, and measuring the
significance of a detection, is provided through the calculation of the Bayesian evidence
statistic. This method is already being used for a range of other low SNR astrophysical
scenarios, including model fitting to observations of the Sunyaev-Zel'dovich Effect (see e.g.
Marshall et al. 2003; Feroz et al. 2009; Allison et al. 2011), where we are interested in
comparing between competing models for a redshift-independent observable.

In this paper we present the application of an existing 
Bayesian Monte Carlo algorithm to the problem of assigning 
significance to the detection and modeling of HI absorption 
lines in a simulated ASKAP data cube. Unless otherwise 
stated, all errors refer to the 68.3 % interval. 

2 Simulated data 

The expected properties of a full ASKAP data cube include a 30 deg$^2$ field of view and 300 MHz
of bandwidth with 16,384 channels, corresponding to an HI velocity resolution of 12 km s$^{-1}$
at 800 MHz. Present computing limitations meant that it was only possible to simulate 1024
spectral channels over the full 30 deg$^2$ ASKAP field, equating to 18 MHz of spectral bandwidth.
A simulated ASKAP-FLASH data cube covering the redshift range 0.76 < z < 0.792 was released by
the ASKAP computing group in May 2011, and details were made publicly available online^3.

The FLASH simulation included both spectral-line and 
continuum information, and the basic steps were: 

1. Create a realistic continuum sky simulation at 850 MHz, using the semi-empirical SKADS
   simulation by Wilman et al. (2008) and an assumed integration time of two hours per pointing
   (see Figure 1).

2. 'Paint in' a grid of Gaussian HI absorption-line profiles covering a range in velocity full
   width at half maximum (FWHM) and peak optical depth $\tau$. S/N calculations indicate that
   only sources stronger than about 50 mJy beam$^{-1}$ are realistic targets for the FLASH survey
   (with a planned observing time of two hours per ASKAP pointing), but sources with flux
   densities down to 10 mJy beam$^{-1}$ had HI absorption lines added in the simulation so that
   our line-finding method could be tested in the low S/N regime.

To provide a useful test of line-finding algorithms, the number of absorption lines inserted into
the simulated data is far higher than the number that we would expect to see in a real ASKAP data
cube. In total 600 lines (each with a single Gaussian profile) were inserted into the simulated
cube. They spanned a redshift range 0.76 < z < 0.792, with optical depths 0.01 < $\tau$ < 0.30
and velocity widths between 5 and 80 km s$^{-1}$. Not all of these lines are expected to be
detectable in the final simulated data cube.
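
The redshift range quoted above maps onto observed frequency through $\nu_{\mathrm{obs}} = \nu_{\mathrm{rest}}/(1+z)$. The following is a minimal sketch of that conversion, assuming the standard HI rest frequency of 1420.405752 MHz; the function names are illustrative and are not part of any ASKAP software.

```python
HI_REST_MHZ = 1420.405752  # rest frequency of the HI 21 cm line (assumed standard value)


def freq_from_redshift(z):
    """Observed frequency (MHz) of the HI line emitted at redshift z."""
    return HI_REST_MHZ / (1.0 + z)


def redshift_from_freq(freq_mhz):
    """Redshift at which the HI line is observed at freq_mhz (MHz)."""
    return HI_REST_MHZ / freq_mhz - 1.0


# Edges of the simulated redshift range.
for z in (0.76, 0.792):
    print(f"z = {z:.3f} -> {freq_from_redshift(z):.2f} MHz")
```

For z = 0.76 and z = 0.792 this gives roughly 807.0 MHz and 792.6 MHz respectively.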

The continuum and spectral-line datasets were kept sep- 
arate to mimic the effects of continuum subtraction, since the 
capability to do this in the ASKAP pipeline had not yet been 
fully implemented. 



3 Method 

To search for HI absorption from neutral gas in distant galaxies we target sightlines towards
known bright continuum sources, since the detection probability for an HI absorption line is
independent of redshift but increases with the brightness of the background continuum source.
The blind aspect of the survey arises from searching for absorption dips in the spectral domain,
so we do not need a 3-dimensional source finder, such as DUCHAMP^4 (Whiting 2008; Whiting et al.
in prep.). Instead, we need a tool that detects any spectral lines, quantifies their properties
based on an analytical model, and provides an estimate of the detection significance. Standard
minimisation and residual inspection have previously been used to fit parametrised Gaussian
models in HI absorption surveys (see e.g. Gupta et al. 2010; Kanekar et al. 2009), and the
analysis outlined in this work uses a generalised extension of those methods. In the following
sections we discuss a Bayesian approach to the one-dimensional line-finding problem.
Figure 1: Top: A simulated 850 MHz continuum ASKAP image based on the semi-empirical SKADS
simulation by Wilman et al. (2008). The maximum pixel value has been limited to
0.01 Jy beam$^{-1}$ to prevent the brightest sources from dominating the image. Bottom: An
example continuum-subtracted spectrum extracted from the ASKAP-FLASH simulated data cube at the
position, RA(J2000) = 12h 28m 26.86s and Dec.(J2000) = -47° 03′ 31.50″, of a source of flux
density $S_{800}$ = 198.7 mJy. The red lines indicate the positions of two HI absorption-line
components present in the spectrum. Component a is clearly visible above the noise level.



^3 http://www.atnf.csiro.au/people/Matthew.Whiting/ASKAPsimulations.php
^4 http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp/
3.1 Spectra extraction

One-dimensional spectra are extracted from the simulated ASKAP data cube using a scripted PYTHON
routine in the CASA data reduction package. An input catalogue of the 435 continuum sources that
contain simulated HI absorption lines is used to provide the known positions from which the
spectra are extracted. Extraction is performed at the centre position of each source using the
task IMVAL in CASA. Each source is indexed based on its flux density at 800 MHz, ordered in
descending value. Figure 1 shows an example of an extracted spectrum from a continuum source, in
which there are two HI absorption-line components. One of the components is clearly visible by
eye at 794.9 MHz (z = 0.787), while the other is buried within the noise at 807.0 MHz
(z = 0.760). The spectral data are stored as individual data files for each continuum source
with information on the frequency, brightness and uncertainty. The uncertainties in the data are
estimated based on the median of the absolute deviations from the median value (MADFM). This
statistic is a more robust estimator of the true uncertainty than the RMS when a strong signal is
present in the data. For Gaussian distributed data, the true standard deviation is estimated by
multiplying the MADFM statistic by a factor of 1.4826042. The spectra are also stored in Flexible
Image Transport System (FITS; Wells et al. 1981) format, so that they are compatible for use
with the DUCHAMP source finder.
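
The MADFM noise estimate described above can be sketched in a few lines. This is an illustrative implementation using NumPy, not the authors' extraction script; the synthetic spectrum, its 20-channel line and the function names are assumptions made for the demonstration.

```python
import numpy as np

MADFM_TO_SIGMA = 1.4826042  # conversion factor quoted in the text for Gaussian noise


def madfm_sigma(values):
    """Estimate the noise standard deviation from the MADFM statistic."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    madfm = np.median(np.abs(values - median))
    return MADFM_TO_SIGMA * madfm


def demo_spectrum(n_chan=4096, sigma=1.0, seed=0):
    """Synthetic noise spectrum containing one strong, hypothetical absorption line."""
    rng = np.random.default_rng(seed)
    spectrum = rng.normal(0.0, sigma, n_chan)
    spectrum[2000:2020] -= 30.0 * sigma  # strong signal occupying 20 channels
    return spectrum


spectrum = demo_spectrum()
print("RMS estimate:  ", np.std(spectrum))
print("MADFM estimate:", madfm_sigma(spectrum))
```

With a strong line present, the RMS is inflated well above the input noise level, while the MADFM-based estimate stays close to it.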



3.2 Bayesian inference 

We fit analytical models to the extracted spectral data using Bayesian inference. The posterior
(or joint) probability for a set of model parameters ($\theta$), given the data ($d$) and the
model hypothesis ($\mathcal{M}$), can be calculated from Bayes' theorem,

$$\Pr(\theta|d,\mathcal{M}) = \frac{\Pr(d|\theta,\mathcal{M})\,\Pr(\theta|\mathcal{M})}{\Pr(d|\mathcal{M})}. \qquad (1)$$



The probability of the data given the model parameters, known as the likelihood, can be
calculated based on assumptions about the distribution of the uncertainty in the data. If the
data set is large and therefore quasi-continuous (such as the thermal noise generated in radio
instrumentation), one can approximate the likelihood by the form given for Gaussian multivariate
data (see e.g. Sivia 2006),

$$L \equiv \Pr(d|\theta,\mathcal{M}) = \frac{1}{\sqrt{(2\pi)^{N}|C|}} \exp\left[-\frac{(d-m)^{T} C^{-1} (d-m)}{2}\right], \qquad (2)$$



where $N$ is equal to the size of $d$, $C$ is the covariance matrix of the data, $|C|$ is the
determinant of the covariance matrix and $m$ is the vector of model data. In the special case
where the variance in the data is a constant ($\sigma^2$) and uncorrelated, the above expression
for the likelihood reduces to

$$L = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{\sum_i (d_i - m_i)^2}{2\sigma^2}\right]. \qquad (3)$$
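
In practice the likelihood of Equation 3 is evaluated in logarithmic form to avoid numerical underflow. The sketch below is one way to write it with NumPy; the function name is illustrative and it assumes a single, known noise level `sigma` for every channel.

```python
import numpy as np


def log_likelihood(data, model, sigma):
    """Natural log of Equation 3: independent Gaussian noise of constant sigma."""
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    n = data.size
    chi2 = np.sum((data - model) ** 2) / sigma**2
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) - 0.5 * chi2
```

Passing `model=0.0` gives the log-likelihood of a model that predicts no signal in any channel.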



The probability of the parameter values given the model hypothesis, $\Pr(\theta|\mathcal{M})$,
is often known as the prior probability and encodes information about the parameter values a
priori. For example, consider the situation where the frequency position of an intervening HI
absorber has been relatively well constrained from previous observations. If we trust these
observations we might then choose to apply a normal prior to the spectral-line position based on
the known level of uncertainty. We would otherwise apply uninformative priors to the parameters
if we were previously unaware of their value. Uninformative priors are typically uniform in
either linear space (for location parameters) or logarithmic space (for scale parameters, known
as the Jeffreys prior).

The normalisation of the posterior probability in Equation 1 is equal to the probability of the
data given the model hypothesis and is referred to throughout this work as the evidence. The
evidence is calculated by marginalising the product of the likelihood and prior distributions
over the model parameters. This is given by

$$E \equiv \Pr(d|\mathcal{M}) = \int \Pr(d|\theta,\mathcal{M})\,\Pr(\theta|\mathcal{M})\,d\theta, \qquad (4)$$



which follows from the relation given by Equation 1 and the requirement that the integrated
posterior is normalised to unity. When the model hypothesis provides a good fit to the data the
likelihood peak will have a large value, and hence the model hypothesis will have a large
associated evidence value. However, if the model is over-complex then there will be large regions
of low likelihood within the prior volume, thus reducing the evidence value for this model, in
agreement with Occam's razor. Estimation of the evidence is often key in providing a tool for
selecting between competing models.
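
The Occam penalty described above can be seen in a toy problem that is much simpler than the spectral-line case. The sketch below is an illustration only: it assumes a one-parameter model (a constant offset `mu` with a uniform prior) and integrates Equation 4 on a regular grid rather than by nested sampling. All names and the synthetic data are hypothetical.

```python
import numpy as np


def log_evidence_uniform(data, sigma, half_width, n_grid=20001):
    """Log-evidence for a constant-offset model with a uniform prior on the
    offset mu over [-half_width, +half_width], by trapezoidal integration."""
    data = np.asarray(data, dtype=float)
    mu = np.linspace(-half_width, half_width, n_grid)
    chi2 = np.sum((data[None, :] - mu[:, None]) ** 2, axis=1) / sigma**2
    log_like = -0.5 * data.size * np.log(2.0 * np.pi * sigma**2) - 0.5 * chi2
    log_integrand = log_like - np.log(2.0 * half_width)  # likelihood times prior
    peak = log_integrand.max()
    y = np.exp(log_integrand - peak)  # rescaled to avoid underflow
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(mu))
    return peak + np.log(integral)


rng = np.random.default_rng(1)
data = rng.normal(0.5, 1.0, 50)  # hypothetical data: true offset 0.5, sigma 1
for half_width in (1.0, 10.0, 100.0):
    print(half_width, log_evidence_uniform(data, 1.0, half_width))
```

The best-fit likelihood is the same in all three cases, but each tenfold widening of the prior lowers the log-evidence by about ln 10, because more of the prior volume is spent on poorly fitting parameter values.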

3.3 Application to spectral-line finding 

In the approach presented in this work we wish to ask the question: Do the data warrant a model
hypothesis that includes the presence of a spectral line of a given form, in preference to a
model with no spectral lines at all? In the case of the simulated ASKAP-FLASH data the underlying
signal is known to be a single Gaussian component, with all the continuum signal perfectly
subtracted. Hence we use a spectral-line model hypothesis that is given by a single Gaussian of
the form

$$I_{\nu} = I_{\nu,\mathrm{peak}} \exp\left[-4\ln(2)\,\frac{(\nu - \nu_{\mathrm{peak}})^2}{(\Delta\nu)^2}\right], \qquad (5)$$



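where the set of model parameters $\theta$ consists of the characteristic peak value
$I_{\nu,\mathrm{peak}}$, the spectral position $\nu_{\mathrm{peak}}$, and the FWHM of the
spectral line $\Delta\nu$. We test this single Gaussian spectral-line model against the null
hypothesis of a model containing no spectral line at all. In the case of perfectly continuum
subtracted data we expect there to be no signal (i.e. $m_{i,\mathrm{null}} = 0$ for all $i$) and
so the likelihood of the data for the null model reduces to

$$L_{\mathrm{null}} = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{\sum_i d_i^2}{2\sigma^2}\right]. \qquad (6)$$

Because the null model has no free parameters, its evidence is equal to this likelihood.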



We are simulating a blind absorption survey and so use uninformative priors for all of the
parameters in our spectral-line model (see Table 1). The line-depth prior is set by the
physical limit of the brightness of each source. We can also






Publications of the Astronomical Society of Australia 



[Figure 2, top panel: data, model and residual spectra plotted against frequency $\nu$ (MHz) from 790 to 810 MHz, annotated with $R = 60.53 \pm 0.08$. Bottom panels: marginalised posterior distributions of $\tau$, $z$ and $\Delta v$ (km s$^{-1}$).]

Figure 2: Top: Example of spectral-line fitting to a simulated ASKAP source (RA = 12h 28m 26.86s,
Dec. = -47° 03′ 31.50″, $S_{800}$ = 198.7 mJy). One of the absorption-line components is
detected, while the other is hidden in the simulated noise. The residual has been plotted with an
offset from the frequency axis for clarity. Bottom: Estimate of the marginalised posterior
probabilities for absorption-line parameters from the example detected source. The parameters
displayed are the peak optical depth ($\tau$), redshift ($z$) and velocity FWHM ($\Delta v$). The
grey-scale represents the 68.3, 95.4, and 99.7 % intervals. The dashed lines represent the input
catalogue values.
search for emission by reversing the sign of the line-depth prior and instead considering
positive values. The spectral position is limited to the range of frequencies recorded by the
data. The prior range for the FWHM of the spectral line corresponds to a velocity range of
~0.65 to 650 km s$^{-1}$ at 800 MHz, which are considered to be physically reasonable limits.

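We use the ratio of the probabilities for model hypotheses given the data,

$$\frac{\Pr(\mathcal{M}_1|d)}{\Pr(\mathcal{M}_2|d)} = \frac{\Pr(d|\mathcal{M}_1)}{\Pr(d|\mathcal{M}_2)}\,\frac{\Pr(\mathcal{M}_1)}{\Pr(\mathcal{M}_2)} = \frac{E_1}{E_2}\,\frac{\Pr(\mathcal{M}_1)}{\Pr(\mathcal{M}_2)}, \qquad (7)$$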

to quantify the relative significance of the Gaussian spectral-line versus no-line model. The
ratio $\Pr(\mathcal{M}_1)/\Pr(\mathcal{M}_2)$ encodes our prior belief that one hypothesis is
favoured over another. Since we assume no prior information on the presence of spectral lines,
this ratio is equal to unity and so the above selection criterion is then given by the ratio of
the evidences. We define the quantity

$$R \equiv \ln\left(\frac{E_{\mathrm{Gauss}}}{E_{\mathrm{null}}}\right), \qquad (8)$$

with values greater than zero indicating the level of significance for the Gaussian spectral-line
detection. Values of $R$ less than zero indicate that the data do not warrant the inclusion of
the Gaussian spectral-line model over the null hypothesis and so the detection is rejected.
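
To make the selection criterion concrete, the sketch below estimates $R$ for a synthetic spectrum by brute-force integration of the evidence on a regular grid. It is an illustration under stated assumptions, not the method used in this work (which relies on nested sampling): the grid sizes, the noise level, the injected line and all function names are hypothetical, and the prior ranges are borrowed from Table 1.

```python
import numpy as np


def gaussian_line(freq, peak, centre, fwhm):
    """Equation 5: Gaussian profile parametrised by peak, centre and FWHM."""
    return peak * np.exp(-4.0 * np.log(2.0) * (freq - centre) ** 2 / fwhm**2)


def log_like(data, model, sigma):
    """Log of Equation 3, summed over the last (channel) axis."""
    n = data.shape[-1]
    chi2 = np.sum((data - model) ** 2, axis=-1) / sigma**2
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) - 0.5 * chi2


def detection_statistic(freq, data, sigma, peak_max,
                        n_peak=40, n_centre=200, n_fwhm=20):
    """Coarse grid estimate of R = ln(E_line / E_null) for an absorption line."""
    # Regular grids in the coordinates where each prior is flat (cf. Table 1).
    peaks = -np.exp(np.linspace(np.log(0.001), np.log(peak_max), n_peak))
    fwhms = np.exp(np.linspace(np.log(0.001), np.log(1.0), n_fwhm))
    centres = np.linspace(freq.min(), freq.max(), n_centre)
    logl = np.empty((n_peak, n_fwhm, n_centre))
    for i, peak in enumerate(peaks):
        for j, fwhm in enumerate(fwhms):
            models = gaussian_line(freq[None, :], peak, centres[:, None], fwhm)
            logl[i, j] = log_like(data[None, :], models, sigma)
    # Evidence approximated as the prior-weighted mean likelihood over the grid.
    top = logl.max()
    log_e_line = top + np.log(np.mean(np.exp(logl - top)))
    # The null model has no free parameters, so its evidence is its likelihood.
    log_e_null = log_like(data, np.zeros_like(data), sigma)
    return log_e_line - log_e_null


rng = np.random.default_rng(42)
freq = np.linspace(790.0, 810.0, 256)         # MHz
noise_only = rng.normal(0.0, 1.0, freq.size)  # hypothetical 1 mJy noise
with_line = noise_only + gaussian_line(freq, -6.0, 795.0, 0.3)
r_noise = detection_statistic(freq, noise_only, 1.0, peak_max=50.0)
r_line = detection_statistic(freq, with_line, 1.0, peak_max=50.0)
print(f"R (noise only) = {r_noise:.2f}, R (with line) = {r_line:.2f}")
```

A fixed grid like this scales poorly with the number of parameters and can miss narrow likelihood peaks, which is one reason a sampling algorithm is preferable for real data.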

It should be noted that we have chosen to use only single Gaussians to parametrize the
absorption lines, which for the case of the simulated ASKAP-FLASH data is equal to the
underlying model. However, the technique can be used for any model parametrization of the
spectral-line profile. The validity of using more complex models for a given data set can be
inferred by comparing the successive evidence values. Indeed we can follow up a detection using
the single Gaussian profile by incrementally increasing the number of components and comparing
the evidence for each model hypothesis until a best fit is obtained. The evidence statistic will
penalise overly complex models and so will likely reach an optimised value after a fixed number
of components. The quality of the best-fit model can also be inferred qualitatively by inspection
of the residual spectrum. In addition to more complex spectral-line models we may also wish to
simultaneously fit to the continuum spectrum, rather than subtracting a best-fit continuum model
prior to analysis. In this case we can compare a continuum plus spectral-line model to a
continuum-only model and therefore again infer the presence of spectral lines in our data. This
has the benefit of correctly propagating the uncertainties in the continuum model parameters
through to the derived marginalised probability distributions of our spectral-line model
parameters.

3.4 Implementation 

Bayesian model fitting is implemented using the existing MultiNest package developed by Feroz &
Hobson (2008) and Feroz et al. (2009). This software uses nested sampling (Skilling 2004) to
explore parameter space and robustly calculate both the posterior probability distribution and
the evidence for a given likelihood function and prior (provided by the user).



Table 1: Model parameter priors

Parameter                 Prior type        Prior range
$I_{\nu,\mathrm{peak}}$   log-uniform       ±(0.001 mJy to $S_{800}$)
$\nu_{\mathrm{peak}}$     linear-uniform    790 to 810 MHz
$\Delta\nu$               log-uniform       0.001 to 1 MHz
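
Nested-sampling codes such as MultiNest draw samples from the unit hypercube and leave it to the user to map them onto physical parameters. The sketch below shows how the priors in Table 1 could be written as such a mapping. It is a hypothetical illustration rather than the authors' code: the function names and the choice to pass the source flux density as an argument are assumptions.

```python
import numpy as np


def uniform_prior(u, lower, upper):
    """Map u in [0, 1] onto a prior that is flat in linear space."""
    return lower + u * (upper - lower)


def log_uniform_prior(u, lower, upper):
    """Map u in [0, 1] onto a prior that is flat in logarithmic space."""
    return lower * (upper / lower) ** u


def prior_transform(cube, s800_mjy, absorption=True):
    """Convert a unit-cube sample into (peak [mJy], centre [MHz], FWHM [MHz])."""
    u_peak, u_centre, u_fwhm = cube
    sign = -1.0 if absorption else 1.0  # a positive sign searches for emission
    peak = sign * log_uniform_prior(u_peak, 0.001, s800_mjy)
    centre = uniform_prior(u_centre, 790.0, 810.0)
    fwhm = log_uniform_prior(u_fwhm, 0.001, 1.0)
    return np.array([peak, centre, fwhm])


print(prior_transform([0.5, 0.5, 0.5], s800_mjy=198.7))
```

Writing the log-uniform priors this way means that equal steps in the unit-cube coordinate correspond to equal multiplicative steps in line depth and width.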
We run MultiNest with the multi-modal option switched on, whereby samples are taken of multiple
likelihood peaks within parameter space, thus allowing for multiple absorption lines. For each
peak in likelihood we calculate a local evidence value for the single Gaussian model, and then
compare it with the evidence of a model with no line. The significance of the Gaussian-line
profile for a given data set is inferred by the relative value of the local evidence compared
with $E_{\mathrm{null}}$. If the local evidence for the single Gaussian model is less than or
equal to $E_{\mathrm{null}}$ then this "detection" is rejected. Following the successful
completion of the nested sampling algorithm, both the multi-modal local evidence values and the
model parameter posterior probability are recorded. In this work we use simple Message Passing
Interface (MPI) to split the spectral data across multiple processors; however, MultiNest has
intrinsic MPI capability and the use of this for ASKAP-FLASH will be investigated in future work.

The method described in this work, whereby we infer 
the probability of the spectral model given the data, is a for- 
ward approach to the problem and hence we do not apply a 
smoothing kernel to the spectral data. To do so would intro- 
duce assumptions about the underlying signal in the data and 
therefore introduce false detections into the results, which 
would be indistinguishable from true detections. 

4 Results and discussion 
4.1 Output from the line-finder 

Figure 2 shows an example of the output from line detection in a simulated ASKAP spectrum. In
this example one of the two absorption-line components known to be present in the spectrum has
been detected above the noise and the posterior probability for the Gaussian parameters
estimated. The second absorption-line component at 807.0 MHz (z = 0.760), while having a
relatively wide FWHM of $\Delta v$ = 80 km s$^{-1}$, has a low optical depth of $\tau$ = 0.02
and so was not detected above the noise. It is clear from the residual spectrum that no other
lines are present above the noise level.

For this example spectrum we calculate that $R$ = 60.53 ± 0.07, indicating that the
Gaussian-line model is significantly favoured. The marginalised posterior probability
distributions for each parameter are shown in the lower plot in Figure 2 and are reasonably
Gaussian in shape. The 2-dimensional contours indicate the correlation between parameters. There
is no apparent correlation between the peak optical depth and redshift, or between the FWHM and
redshift. There is some anti-correlation between the peak optical depth and FWHM of the line,
indicating conservation of the integrated optical depth.
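
The anti-correlation has a simple origin for a Gaussian profile. As a short worked relation (a standard Gaussian integral, assuming that the optical-depth profile itself is Gaussian in velocity), the velocity-integrated optical depth is

$$\tau_{\mathrm{int}} = \int \tau(v)\,dv = \tau_{\mathrm{peak}}\,\Delta v\,\sqrt{\frac{\pi}{4\ln 2}} \approx 1.064\,\tau_{\mathrm{peak}}\,\Delta v,$$

so if the data constrain the product $\tau_{\mathrm{peak}}\,\Delta v$ more tightly than either factor alone, a larger fitted peak depth must be compensated by a smaller FWHM.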

It has been noted that the simulated absorption catalogue was constructed based on a single
Gaussian-line model. However, when analysing real ASKAP-FLASH data we will have to make an
assumption about the analytical form of the line profile. Calculation of the Bayesian evidence
statistic provides us with a global likelihood for selecting between competing models. If, for
example, we choose to parametrise the data using a Lorentzian-line model (including the same
uniform priors used for the Gaussian-line model) then we obtain a value of $R$ = 56.88 ± 0.07.
In this case the evidence again rejects the no-line model, but favours the Gaussian-line over the
Lorentzian-line model.
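
Because both values of $R$ are measured against the same null evidence, their difference is the log of the Gaussian-to-Lorentzian evidence ratio. The short calculation below works this through using the two values quoted above; combining the uncertainties in quadrature is an assumption that treats the two evidence estimates as independent.

```python
import math

r_gauss, r_gauss_err = 60.53, 0.07      # Gaussian-line model versus null
r_lorentz, r_lorentz_err = 56.88, 0.07  # Lorentzian-line model versus null

# ln(E_Gauss / E_Lorentz) = R_Gauss - R_Lorentz, since E_null cancels.
delta_r = r_gauss - r_lorentz
delta_r_err = math.hypot(r_gauss_err, r_lorentz_err)  # assumes independent errors
evidence_ratio = math.exp(delta_r)

print(f"Delta R = {delta_r:.2f} +/- {delta_r_err:.2f}")
print(f"Evidence ratio (Gaussian : Lorentzian) ~ {evidence_ratio:.0f} : 1")
```

A difference of about 3.65 in $R$ corresponds to an evidence ratio of a few tens to one in favour of the Gaussian profile.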



4.2 Comparison with input catalogue 

Of the 600 absorption-line components painted onto the 435 
brightest sources in the continuum simulation, 60 are found 
to be at locations off the edge of the main field of the image 
and are therefore discounted from the sample. Of the com- 
ponents located within the image, there are 3 detections with 
R less than unity. These detections have comparable signif- 
icance to the ten false-positive detections and are therefore 
counted as non-detections. Note that the 10 false-positive 
detections (which include 4 in absorption and 6 in emission) 
have very low significance and unphysical velocity widths 
and hence can be distinguished from the correct detections. 

Of the remaining absorption-line components, 76 are de- 
tected above the noise with R greater than unity, yielding a 
detection rate of 14% from a realistic 2-h integration on an 
ASKAP field. 
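
The quoted detection rate can be checked directly from the counts above. Taking the 540 components that lie within the image as the denominator (an assumption; excluding the three low-significance detections gives 537 and the same rounded value):

$$N_{\mathrm{image}} = 600 - 60 = 540, \qquad \frac{76}{540} \approx 0.141 \approx 14\,\%.$$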

Figure 3 plots the peak and integrated HI optical depth versus the 800 MHz flux density of the
background continuum source for both detected and undetected lines from the ASKAP-FLASH
simulation. The dashed line in the first plot shows the detection limit in peak optical depth
$\tau$ originally assumed by the FLASH team, based on the 5-sigma detection of a line peak in a
single 18 kHz spectral channel in a 2-h ASKAP observation. We detect almost all of the sources
expected to be found in the real ASKAP data, as well as some additional weaker lines. The
simulation results therefore imply that the assumed FLASH detection limit is reasonable, and may
even be slightly conservative.

The plot of integrated optical depth $\tau_{\mathrm{int}}$ in Figure 3 also shows the existing
observational data points for HI absorption lines at z < 1 from Table 1 of Curran et al. (2008).
This plot shows that the FLASH survey should be able to detect similar HI absorption lines
against continuum sources which are 10 to 100 times fainter than those typically probed in
targeted HI absorption-line searches with existing radio telescopes.

Figure 4 compares the estimated and input catalogue values for each of the absorption-line model
parameters, plotted as a function of the continuum source flux density. The redshift position of
each line is the most precisely determined parameter from model fitting, with differences
compared to the input catalogue of ~0.01 %. The peak optical depth and FWHM parameters are less
precisely determined by model fitting to the simulated data, with differences of ~10 %. The large
majority of parameters are within 1 to 3$\sigma$ of their expected catalogue values. The few
significant outliers are likely due to the imaging procedure still in development by the ASKAP
computing group. We extract the spectral data from a pixel at the position of the source, and so
either pixelisation of the input model or imaging artefacts may produce an offset in the
estimated parameters.
Figure 3: The results of running our Bayesian line finder on the simulated ASKAP data cube. Red
circles and blue crosses represent detected and undetected lines respectively. Top: Lines of
different peak optical depth $\tau$ as a function of the flux density of the background continuum
source. The dashed line represents the cut-off for detectability in a 2 hour ASKAP observation as
originally estimated by the FLASH team (see text). Bottom: The velocity integrated HI optical
depth over the line. The black open stars show published observational data points for radio
detections of intervening HI absorption lines at redshift z < 1, taken from Table 1 of Curran et
al. (2008).
Figure 4: Comparison between the estimated and true values for each absorption-line parameter,
as a function of the 800 MHz source flux density. Sources located outside of the image edge or
with $R$ less than unity are not included in the sample. The errorbars represent the 1$\sigma$
uncertainty.



4.3 Comparison with DUCHAMP 

We searched for absorption-line components in the same set 
of 435 simulated ASKAP spectra using the DuCHAMP 3- 
dimensional source finder (for technical details please refer 
to|Whiting 2008). 

This source finder uses an intensity-thresholding algo- 
rithm without assuming any underlying analytical model for 
the source shape or line profile. DuCHAMP is therefore opti- 
mized for detecting complex sources in 3 dimensions rather 
than simultaneous detection and parametric model fitting of 
spectral-line profiles. 

For this work we are considering the spectral domain 
and so we ran DUCHAMP in a mode such that it was opti- 
mized for spectral searches (for example we set the param- 
eters SEARCHType and SMOOTHType to "spectral"). The 
source-finding parameters are optimised so that we minimise 
the number of false positive detections, while maximising 
the number of detected absorption-line components. The 
data are smoothed using a Hanning filter width of 3 and the 
detections are accepted if they are brighter than a threshold 
of 3 a above the mean and contain more than 3 contiguous 
channels. In order to correctly determine the total number of 
false-positive detections we run the program in both emission 
and absorption-line mode. 
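
The smoothing and thresholding steps described above can be illustrated with a short stand-alone sketch. This is not DUCHAMP and does not reproduce its algorithm: it is a simplified NumPy stand-in that assumes a 3-channel Hanning kernel, uses the median and MADFM as robust substitutes for the mean and noise level, and treats the minimum run length as an adjustable, hypothetical parameter.

```python
import numpy as np


def hanning_smooth3(spectrum):
    """Smooth with a 3-channel Hanning kernel (weights 0.25, 0.5, 0.25)."""
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(spectrum, kernel, mode="same")


def find_absorption_runs(spectrum, n_sigma=3.0, min_channels=3):
    """Return (first, last) channel indices of contiguous runs lying more than
    n_sigma below the median of the smoothed spectrum."""
    smooth = hanning_smooth3(np.asarray(spectrum, dtype=float))
    median = np.median(smooth)
    sigma = 1.4826042 * np.median(np.abs(smooth - median))  # MADFM noise estimate
    below = smooth < median - n_sigma * sigma
    runs, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            if i - start >= min_channels:
                runs.append((start, i - 1))
            start = None
    if start is not None and below.size - start >= min_channels:
        runs.append((start, below.size - 1))  # run reaching the last channel
    return runs


rng = np.random.default_rng(3)
channels = np.arange(512)
spectrum = rng.normal(0.0, 1.0, channels.size)  # hypothetical unit-variance noise
spectrum -= 10.0 * np.exp(-4.0 * np.log(2.0) * (channels - 105) ** 2 / 8.0**2)
print(find_absorption_runs(spectrum))
```

Unlike the Bayesian approach, a threshold search of this kind returns only channel ranges, so line parameters and a detection significance have to be derived in a separate step.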

Of the 540 absorption-line components located within 
the edge of the image, 63 are correctly detected with the 
DUCHAMP source finder. We obtain 7 false-positive detec- 
tions with 3 in absorption and 4 in emission. One of these 
false positive detections has an unphysical peak optical depth, 
while the other 6 have relatively low SNR and are indistin- 
guishable from the other correctly detected low-SNR absorp- 
tion components. 

Figure 5 shows the velocity-integrated optical depth versus the 800 MHz source flux density for
detected absorption-line components using both the Bayesian line finder and DUCHAMP.
There are 18 absorption-line components that are correctly 
detected with the Bayesian line finder and not with DUCHAMP, 
including 3 which have R less than unity and are hence re- 
jected due to low significance. Both of the absorption-line 
components that are correctly detected with DUCHAMP and 
not with the Bayesian line finder have low SNRs (less than 3) 
and are therefore difficult to distinguish from the false posi- 
tives. 

Qualitatively, DUCHAMP requires significantly lower computation time for the 1-dimensional
spectral-line finding problem, because calculation of the evidence statistic requires Monte-Carlo
integration over a multi-dimensional parameter space (see Equation 4). However, the Bayesian line
finder provides a more robust method for detecting low-significance spectral lines, estimating
the probability distribution of model parameters, and selecting between competing analytical
models.

5 Conclusions 

We have applied the multi-nested sampling algorithm to sim- 
ulated ASKAP-FLASH data in order to test its usefulness 
in both finding and fitting absorption-line components. This 
Bayesian technique provides us with a robust tool for select- 
ing spectral-line detections in low-SNR data, along the line 
of sight to known continuum sources. The sampling algo- 
rithm is necessarily slower than the DUCHAMP source finder 
Figure 5: The velocity-integrated optical depth versus 800 MHz source flux density for detected
absorption-line components using the Bayesian line finder (blue circles) and DUCHAMP (red
crosses).



because it calculates an estimate of the Bayesian evidence 
statistic, and hence provides us with a method of both as- 
signing significance to our detections and selecting between 
competing models. Our analysis of a simulated ASKAP data 
cube also shows that the line-finding techniques presented in 
this paper can robustly detect HI absorption lines at (and even 
slightly below) the levels originally estimated by the FLASH 
team for a two-hour integration with ASKAP. 